Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Authors
Abstract
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning, namely the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function, is at most on the order of $\frac{1}{\mu_{\mathsf{min}}(1-\gamma)^{5}\varepsilon^{2}} + \frac{t_{\mathsf{mix}}}{\mu_{\mathsf{min}}(1-\gamma)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, $t_{\mathsf{mix}}$ and $\mu_{\mathsf{min}}$ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the sample complexity in the synchronous case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the cost taken for the empirical distribution of the Markovian trajectory to reach a steady state, which is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result by a factor of at least $|\mathcal{S}||\mathcal{A}|$ for all scenarios, and by a factor of at least $t_{\mathsf{mix}}|\mathcal{S}||\mathcal{A}|$ for any sufficiently small accuracy level $\varepsilon$. Further, we demonstrate that the scaling on the effective horizon $\frac{1}{1-\gamma}$ can be improved by means of variance reduction.
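The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`async_q_learning`, `optimal_q`, `sample_complexity_order`), the tabular array layout, and the choice of start state are assumptions made for illustration. The first function applies the classical asynchronous update with a constant learning rate `eta` along one trajectory generated by a behavior policy; the last one only evaluates the two terms of the stated bound, ignoring constants and logarithmic factors.

```python
import numpy as np


def async_q_learning(P, R, gamma, behavior, eta, num_steps, seed=0):
    """Asynchronous Q-learning along a single Markovian trajectory.

    P: transition tensor of shape (S, A, S); R: rewards of shape (S, A);
    behavior: behavior policy of shape (S, A), each row summing to 1.
    Only the visited (state, action) entry is updated at each step.
    """
    rng = np.random.default_rng(seed)
    num_states, num_actions = R.shape
    Q = np.zeros((num_states, num_actions))
    s = 0  # assumed start state (hypothetical choice)
    for _ in range(num_steps):
        a = rng.choice(num_actions, p=behavior[s])
        s_next = rng.choice(num_states, p=P[s, a])
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] = (1.0 - eta) * Q[s, a] + eta * target  # constant learning rate
        s = s_next
    return Q


def optimal_q(P, R, gamma, iters=2000):
    """Reference Q* computed by Q-value iteration with a known model."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        Q = R + gamma * (P @ Q.max(axis=1))
    return Q


def sample_complexity_order(mu_min, t_mix, gamma, eps):
    """The two terms of the bound, without constants or log factors."""
    first = 1.0 / (mu_min * (1.0 - gamma) ** 5 * eps ** 2)
    second = t_mix / (mu_min * (1.0 - gamma))
    return first, second
```

On a small MDP, comparing `async_q_learning(...)` against `optimal_q(...)` in the entrywise maximum norm gives the $\ell_{\infty}$ error that the sample complexity refers to. With a constant learning rate the iterate keeps fluctuating around the optimum, so the error settles at a small level rather than vanishing.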
Similar resources
The effect of task complexity on lexical complexity and grammatical accuracy of EFL learners' argumentative writing
Based on Robinson's Cognition Hypothesis (2001, 2003, 2005) and Skehan's Limited Attentional Capacity Model (1998), this study examined the effect of task complexity on the lexical complexity and grammatical accuracy of the argumentative writing of 60 English-language students. Task complexity was determined through resource-dispersing factors. All participants were semi-randomly assigned to one of three groups: (1) the topic group, (2) the topic + idea group, and (3) the topic + ide...
Variance and sample size calculations in quality-of-life-adjusted survival analysis (Q-TWiST).
The Quality-Adjusted Time Without Symptoms or Toxicity (Q-TWiST) statistic previously introduced by Glasziou, Simes and Gelber (1990, Statistics in Medicine 9, 1259-1276) combines toxicity, disease-free survival, and overall survival information in assessing the impact of treatments on the lives of patients. This methodology has received positive reviews from clinicians as intuitive and useful,...
Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands for modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems. Analysis on nonconvex problems is lacking. For the Asynchronous Stochastic Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an ...
Asynchronous Doubly Stochastic Proximal Optimization with Variance Reduction
In the big data era, both of the sample size and dimension could be huge at the same time. Asynchronous parallel technology was recently proposed to handle the big data. Specifically, asynchronous stochastic (variance reduction) gradient descent algorithms were recently proposed to scale the sample size, and asynchronous stochastic coordinate descent algorithms were proposed to scale the dimens...
Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction
Zeroth-order (derivative-free) optimization attracts a lot of attention in machine learning, because explicit gradient calculations may be computationally expensive or infeasible. To handle large scale problems both in volume and dimension, recently asynchronous doubly stochastic zeroth-order algorithms were proposed. The convergence rate of existing asynchronous doubly stochastic zeroth order ...
Journal
Journal title: IEEE Transactions on Information Theory
Year: 2022
ISSN: 0018-9448, 1557-9654
DOI: https://doi.org/10.1109/tit.2021.3120096